
[SPARK-3791][SQL] Provides Spark version and Hive version in HiveThriftServer2 #2843

Closed
wants to merge 4 commits into from

Conversation

liancheng
Contributor

This PR overrides the GetInfo Hive Thrift API to provide correct Spark version information. Another property, spark.sql.hive.version, is added to reveal the underlying Hive version. These are generally useful for Spark SQL ODBC driver providers. It also takes the chance to remove the SET -v hack, which was a workaround for Simba ODBC driver connectivity.
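Conceptually, overriding GetInfo amounts to mapping Thrift info-type requests to server-side strings. A minimal sketch of that mapping (this is not Spark's actual implementation; the method shape and version literal are made up, though CLI_SERVER_NAME, CLI_DBMS_NAME, and CLI_DBMS_VER are real TGetInfoType values in the HiveServer2 Thrift API):

```java
// Conceptual sketch of the GetInfo mapping described above. Not Spark's
// actual code: the enum mirrors a few real TGetInfoType values, and the
// version string is a placeholder.
public class GetInfoSketch {
    enum GetInfoType { CLI_SERVER_NAME, CLI_DBMS_NAME, CLI_DBMS_VER }

    static String getInfo(GetInfoType type, String sparkVersion) {
        switch (type) {
            case CLI_SERVER_NAME:
            case CLI_DBMS_NAME:
                // Identify the server as Spark SQL rather than Hive.
                return "Spark SQL";
            case CLI_DBMS_VER:
                // Report the Spark version instead of the Hive version.
                return sparkVersion;
            default:
                throw new IllegalArgumentException("Unrecognized info type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(getInfo(GetInfoType.CLI_DBMS_VER, "1.2.0")); // prints "1.2.0"
    }
}
```

An ODBC driver's SQLGetInfo call ultimately resolves through this kind of dispatch, which is why the override matters to driver providers.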

TODO

  • Find a general way to figure out Hive (or even any dependency) version.

    This blog post suggests several methods to inspect application version. In the case of Spark, this can be tricky because the chosen method:

    1. must apply to both the Maven build and the SBT build

      For Maven builds, we can retrieve the version information from the META-INF/maven directory within the assembly jar. But this doesn't work for SBT builds.

    2. must not rely on the original jars of dependencies to extract a specific dependency version, because Spark uses an assembly jar.

      This implies we can't read the Hive version from the Hive jar files, since the standard Spark distribution doesn't include them.

    3. should play well with SPARK_PREPEND_CLASSES to ease local testing during development.

      SPARK_PREPEND_CLASSES prevents classes from being loaded from the assembly jar, so we can't locate the jar file and read its manifest.

    Given these constraints, maybe the only reliable method is to generate a source file containing the version information at build time. @pwendell Do you have any suggestions from the perspective of the build process?
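The build-time generation idea can be sketched as follows (a minimal illustration only; the GenerateVersionInfo and SparkBuildInfo names and the version literals are made up, not Spark's actual build code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of a build step that bakes version information into a generated
// source file. Because the versions become compile-time constants, this
// works the same for Maven and SBT builds, needs no dependency jars on
// the classpath, and is unaffected by SPARK_PREPEND_CLASSES.
public class GenerateVersionInfo {
    static String render(String sparkVersion, String hiveVersion) {
        return "public final class SparkBuildInfo {\n"
             + "  public static final String SPARK_VERSION = \"" + sparkVersion + "\";\n"
             + "  public static final String HIVE_VERSION = \"" + hiveVersion + "\";\n"
             + "}\n";
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("gen").resolve("SparkBuildInfo.java");
        Files.writeString(out, render("1.2.0", "0.13.1"));
        System.out.println(Files.readString(out).contains("HIVE_VERSION")); // prints "true"
    }
}
```

In a real build, the generation step would run before compilation and the version strings would come from the build definition rather than literals.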

Update: the Hive version is now retrieved from the newly introduced HiveShim object.

val sparkConf = new SparkConf()
  .setAppName(s"SparkSQL::${java.net.InetAddress.getLocalHost.getHostName}")
  .set("spark.sql.hive.version", "0.12.0-protobuf-2.5")
Contributor Author

This needs to be generalized.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have started for PR 2843 at commit 9799b50.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have finished for PR 2843 at commit 9799b50.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21881/
Test FAILed.

@@ -37,35 +43,81 @@ import org.apache.spark.sql.catalyst.util.getTempFilePath

/**
* Tests for the HiveThriftServer2 using JDBC.
*
* NOTE: SPARK_PREPEND_CLASSES is explicitly disabled in this test suite. Assembly jar must be
* rebuilt after changing HiveThriftServer2 related code.
Contributor Author

This requirement should be OK for Jenkins, since Jenkins always builds the assembly jar before executing any test suites.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have started for PR 2843 at commit 9799b50.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have started for PR 2843 at commit 9799b50.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have finished for PR 2843 at commit 9799b50.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have started for PR 2843 at commit da5e716.

  • This patch merges cleanly.

@liancheng
Contributor Author

Hm, 3 consecutive random build failures, embarrassing...

For the first one, the unit tests were not started at all; it seems the build process was interrupted somehow. The second failure is a bit weird: although we're already using a random port to avoid port conflicts, it still failed to open the listening port. I checked the TCP port range on the Jenkins master node, which should be valid, but I don't have access to the Jenkins slave node that executed this build. The cause of the third failure is a known bug fixed in the master branch; I just rebased onto the most recent master.
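As an aside, the random-port trick mentioned here is commonly implemented by binding to port 0 so the OS assigns a free ephemeral port. A small illustration (not the test suite's actual code):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Bind to port 0 so the OS assigns a free ephemeral port, avoiding
// conflicts between concurrent test runs. Note the inherent race: the
// port is released when the probe socket closes, so another process
// could in principle grab it before the server under test binds it --
// which may explain occasional failures like the one described above.
public class RandomPort {
    static int freePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        int port = freePort();
        System.out.println(port > 0 && port <= 65535); // prints "true"
    }
}
```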

@SparkQA

SparkQA commented Oct 19, 2014

Tests timed out for PR 2843 at commit 9799b50 after a configured wait of 120m.

@SparkQA

SparkQA commented Oct 19, 2014

QA tests have finished for PR 2843 at commit da5e716.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SerializableMapWrapper[A, B](underlying: collection.Map[A, B])
    • class Predict(
    • case class EvaluatePython(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21887/
Test PASSed.

}

sql(s"SET ${testKey + testKey}=${testVal + testVal}")
assert(hiveconf.get(testKey + testKey, "") == testVal + testVal)
assertResult(Set(testKey -> testVal, (testKey + testKey) -> (testVal + testVal))) {
Contributor Author

These lines are removed because they were originally for testing the deprecated hql call. At that time, sql and hql had different code paths. Later those hql calls were changed to sql to avoid compile-time deprecation warnings, which made them exact duplicates.

@SparkQA

SparkQA commented Oct 31, 2014

Test build #22610 has started for PR 2843 at commit 2e5aa55.

  • This patch merges cleanly.

@liancheng
Contributor Author

Updated Hive version information inspection. Waiting for #2685 and #2887 to be merged, then this should be ready to go after rebasing.

@SparkQA

SparkQA commented Oct 31, 2014

Test build #22610 has finished for PR 2843 at commit 2e5aa55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22610/
Test PASSed.

@liancheng
Contributor Author

retest this please

@liancheng
Contributor Author

@marmbrus This should be ready to go once Jenkins says OK. Simba ODBC driver needs this change for the SQLGetInfo ODBC API.

@SparkQA

SparkQA commented Oct 31, 2014

Test build #22661 has started for PR 2843 at commit 2e5aa55.

  • This patch merges cleanly.

@liancheng liancheng changed the title [SPARK-3791][SQL][WIP] Provides Spark version and Hive version in HiveThriftServer2 [SPARK-3791][SQL] Provides Spark version and Hive version in HiveThriftServer2 Oct 31, 2014
@SparkQA

SparkQA commented Nov 1, 2014

Test build #22661 has finished for PR 2843 at commit 2e5aa55.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22661/
Test FAILed.

@liancheng
Contributor Author

Fixed the failing tests and rebased onto the most recent master (with full Hive 0.13.1 support).

@SparkQA

SparkQA commented Nov 1, 2014

Test build #22691 has started for PR 2843 at commit aebb848.

  • This patch merges cleanly.

@marmbrus
Contributor

marmbrus commented Nov 1, 2014

Can you please rebase?

@liancheng
Contributor Author

Done rebasing.

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22728 has started for PR 2843 at commit a873d0f.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22728 has finished for PR 2843 at commit a873d0f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22728/
Test FAILed.

@liancheng
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22748 has started for PR 2843 at commit a873d0f.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22748 has finished for PR 2843 at commit a873d0f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22748/
Test FAILed.

@liancheng
Contributor Author

retest this please

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22759 has started for PR 2843 at commit a873d0f.

  • This patch merges cleanly.

@liancheng
Contributor Author

The previous test failures were caused by the flaky CliSuite. A fix has been proposed in #3060.

@SparkQA

SparkQA commented Nov 2, 2014

Test build #22759 has finished for PR 2843 at commit a873d0f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22759/
Test PASSed.

@marmbrus
Contributor

marmbrus commented Nov 2, 2014

Thanks! Merged to master.

@asfgit asfgit closed this in c9f8400 Nov 2, 2014
@liancheng liancheng deleted the get-info branch November 3, 2014 01:49
@liancheng liancheng restored the get-info branch November 5, 2014 08:12
marmbrus pushed a commit to marmbrus/spark that referenced this pull request Nov 11, 2014
This PR backports apache#2843 to branch-1.1. The key difference is that this one doesn't support Hive 0.13.1 and thus always returns `0.12.0` when `spark.sql.hive.version` is queried.

Six other commits on which apache#2843 depends were also backported:

- apache#2887 for `SessionState` lifecycle control
- apache#2675, apache#2823 & apache#3060 for major test suite refactoring and bug fixes
- apache#2164, for Parquet test suite updates
- apache#2493, for reading `spark.sql.*` configurations

Author: Cheng Lian <[email protected]>
Author: Cheng Lian <[email protected]>
Author: Michael Armbrust <[email protected]>

Closes apache#3113 from liancheng/get-info-for-1.1 and squashes the following commits:

d354161 [Cheng Lian] Provides Spark and Hive version in HiveThriftServer2 for branch-1.1
0c2a244 [Michael Armbrust] [SPARK-3646][SQL] Copy SQL configuration from SparkConf when a SQLContext is created.
3202a36 [Michael Armbrust] [SQL] Decrease partitions when testing
7f395b7 [Cheng Lian] [SQL] Fixes race condition in CliSuite
0dd28ec [Cheng Lian] [SQL] Fixes the race condition that may cause test failure
5928b39 [Cheng Lian] [SPARK-3809][SQL] Fixes test suites in hive-thriftserver
faeca62 [Cheng Lian] [SPARK-4037][SQL] Removes the SessionState instance created in HiveThriftServer2
@liancheng liancheng deleted the get-info branch November 21, 2014 04:22
asfgit pushed a commit that referenced this pull request Nov 10, 2017
…perty

## What changes were proposed in this pull request?

At the beginning, #2843 added `spark.sql.hive.version` to reveal the underlying Hive version for JDBC connections. For some time afterwards, it was used as a version identifier for the execution Hive client.

Actually, there is no Hive client for execution in Spark now, and there are no usages of HIVE_EXECUTION_VERSION anywhere in the Spark project. HIVE_EXECUTION_VERSION is set by `spark.sql.hive.version`, which is still set internally in some places or by users; this may confuse developers and users, given the similarly named HIVE_METASTORE_VERSION (`spark.sql.hive.metastore.version`).

It might be better to remove it.

## How was this patch tested?

Modified some existing unit tests.

cc cloud-fan gatorsmile

Author: Kent Yao <[email protected]>

Closes #19712 from yaooqinn/SPARK-22487.